Low-Level Fusion of Audio and Video Feature for Multi-Modal Emotion Recognition
Authors
Abstract
Bimodal emotion recognition through audiovisual feature fusion has been shown to outperform each individual modality in the past. Still, synchronization of the two streams is a challenge, as many vision approaches work on a frame basis, whereas audio is typically processed on a turn or chunk basis. Therefore, late fusion schemes such as simple logic or voting strategies are commonly used for the overall estimation of the underlying affect. However, early fusion is known to be more effective in many other multimodal recognition tasks. We therefore suggest a combined analysis by descriptive statistics of audio and video Low-Level Descriptors for subsequent static SVM classification. This strategy also allows for a combined feature-space optimization, which is discussed herein. The high effectiveness of this approach is shown on a database of 11.5 hours containing six emotional situations in an airplane scenario.
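The fusion strategy outlined in the abstract can be illustrated with a short sketch: frame-level audio and video Low-Level Descriptors are summarized per segment by descriptive statistics (functionals), concatenated into one joint feature vector, optionally reduced by feature selection as a stand-in for the combined feature-space optimization, and classified with a static SVM. The functional set, the feature dimensions, the use of scikit-learn, and all names below are illustrative assumptions, not the authors' exact pipeline.

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def functionals(lld_frames):
    # Collapse a (num_frames, num_llds) matrix of Low-Level Descriptors into one
    # fixed-length vector of descriptive statistics, independent of frame count.
    return np.concatenate([
        lld_frames.mean(axis=0),
        lld_frames.std(axis=0),
        lld_frames.min(axis=0),
        lld_frames.max(axis=0),
    ])

def fuse_segment(audio_llds, video_llds):
    # Early (feature-level) fusion: one joint vector per emotional segment, built
    # by concatenating the audio and video functionals. Summarizing both streams
    # over the same segment sidesteps frame-level synchronization.
    return np.concatenate([functionals(audio_llds), functionals(video_llds)])

# Hypothetical training data: each row is a fused segment vector, each label one
# of six emotional situations (placeholder random LLDs, not real recordings).
rng = np.random.default_rng(0)
X = np.stack([
    fuse_segment(rng.standard_normal((300, 30)), rng.standard_normal((75, 20)))
    for _ in range(60)
])
y = np.arange(60) % 6

# Combined feature-space optimization is approximated here by univariate feature
# selection over the joint audio-visual space, followed by a static SVM.
model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=50),
    SVC(kernel="rbf"),
)
model.fit(X, y)
print(model.predict(X[:5]))

Note that the two streams may contribute different numbers of frames per segment (300 audio frames versus 75 video frames in this sketch), since the functionals map each stream to a fixed-length vector before fusion.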
Similar resources
Multi-Modal Emotion Recognition Fusing Video and Audio
Emotion plays an important role in human communication. We construct a framework for multi-modal fusion emotion recognition. Facial expression features and speech features are extracted from image sequences and speech signals, respectively. In order to locate and track facial feature points, we construct an Active Appearance Model for facial images with all kinds of expressions. Facial Animatio...
Fusion Framework for Emotional Electrocardiogram and Galvanic Skin Response Recognition: Applying Wavelet Transform
Introduction: To extract and combine information from different modalities, fusion techniques are commonly applied to improve system performance. In this study, we aimed to examine the effectiveness of fusion techniques in emotion recognition. Materials and Methods: Electrocardiogram (ECG) and galvanic skin responses (GSR) of 11 healthy female students (mean age: 22.73±1.68 years) were collected ...
Visual and Audio Aware Bi-Modal Video Emotion Recognition
With the rapid increase in the volume of online video, the analysis and prediction of the affective impact that video content will have on viewers has attracted much attention in the community. To solve this challenge, several different kinds of information about video clips are exploited. Traditional methods normally focused on a single modality, either audio or visual. Later on, some researchers tried to esta...
Spatiotemporal Networks for Video Emotion Recognition
Our article presents an audio-visual multi-modal emotion classification system. Considering that deep learning approaches to facial analysis have recently demonstrated high performance, in our work we use convolutional neural networks (CNNs) for emotion recognition in video, relying on temporal averaging and pooling operations reminiscent of widely used approaches for the spatial ...
Journal:
Volume, issue:
Pages: -
Year of publication: 2007